The purpose of this project is to investigate the provision of timely and effective outpatient and emergency department care in Pennsylvania. Using the CMS “Timely and Effective Care” dataset and US Census Bureau data, the goals of the project are:
+ To compare Pennsylvania emergency department access and care metrics to that of other states, as well as to national averages.
+ To approximate service areas for major hospitals in Pennsylvania and determine if timely and effective care is available for for certain domains in emergency care, specifically: stroke interventions, heart attack interventions, wait time.
+ To investigate if there are any relationships between socioeconomic and demographic variables and TEC using multivariate regressions.
According to the United Health Care Foundation’s report America’s Health Rankings in 2015, Pennsylvania fell below national averages in rankings of state population health, ranking 29th among the 50 states. Furthermore, according to CDC data published in 2014, measures of health status in Pennsylvania vary by race/ethnicity, and patterns across these measures in Pennsylvania are similar to national data. Moreover, disparities in health and health access exist across the geographic regions of the state, with Pennsylvanians living in rural communities more likely to have unmet heath needs and have poor access to health care than those in urban communities.
Access to health care means having “the timely use of personal health services to achieve the best health outcomes” (IOM, 1993). Timely and effective ambulatory care is an important cornerstone of healthcare access, and is essential for good patient outcomes and satisfaction. This is especially true in the US, where the cost of healthcare is high and ever increasing, but healthcare outcomes remain poor, especially for a developed nation. It is important that care not only be evidence-based and clinically indicated, but also timely and prompt, with delivery of the “right care at the right time.”
Ambulatory care includes primary care, outpatient care and emergency room care. Ideally, patients are able to use primary care and outpatient for usual care, and only visit the ED for emergent conditions. Unfortunately, given the cost of healthcare and differential access to quality insurance, many patients use the ED as a ready alternative to their usual source of medical care. Many studies have examined the associations between perceived timely access barriers and reported use of the ED, which have shown that the benefits of usual medical care are often limited by barriers that limit effective and timely access to such care. Addressing these barriers first involves an identification of the ways in which timely and effective care is lacking.
This is an interdisciplinary issue, requiring the input of physicians, public health advocates, epidemiologists, economists, public service advocates, etc. As such, addressing the problem of timely access to care requires a multidisciplinary approach, and aim to employ one in this project to identify region-specific factors influencing access to care, or lack thereof. I thereby hope to provide critical insight into the state of healthcare in Pennsylvania.
Using the Timely and Effective Care (TEC) data from CMS through 2020, I will compare Pennsylvania to other states and national progress on emergency department access and care. The TEC data set includes data for measures of cataract surgery outcome, colonoscopy follow-up, heart attack care, emergency department care, preventive care, pregnancy and delivery care, and cancer care. I will focus on some of the most general issues in emergency care – interventions for stroke, interventions for myocardial infarctions (MI; heart attacks), and wait time for patients in the ED.
First I will generate graphs to visualize how TEC in Pennsylvania compares to other states and to national averages. I will also compare Pennsylvania to its regional neighbors in the Northeast.
Next, I will attempt to look at more granular data for Pennsylvania and compare within state access and availability of care. The TEC data set contains hospital, state-level and national data but, unfortunately, does not contain county-level data. Using the hospital data, I will attempt to calculate service areas of emergency departments using ~2 mile radius of the ED hospital data, building in assumptions to build critical diversions and other necessary information. I will explain all assumptions made.
Finally, I will visualize socioconomic and demographic characteristics as they vary across PA by county, and then will I run multivariate regressions to determine if there are any relationships that can be deduced from the data set that correlate withexplain differences in TEC. I will explain all assumptions made.
This project was supported by Dr. Gary Weissman, MD, MSHP (Assistant Professor of Medicine and Senior Fellow Leonard Davies Institute) and Dr. John Holmes PhD, FACE, FACMI (Professor of Medical Informatics in Epidemiology, Associate Director of the Institute for Biomedical Informatics), who both provided invaluable advice and guidance for mapping the data and assumptions for service areas.
All necessary packages are installed and loaded, and necessary keys are registered.
## Loading required package: knitr
Functions necessary for later processing are created.
# Theme for maps
my_theme <- function() {
theme_minimal() +
theme(
axis.line = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_line(color = "white"),
legend.key.size = unit(0.5, "cm"),
legend.text = element_text(size = 8),
legend.title = element_text(size = 10),
plot.title = element_text(size = 14),
#hjust = 0.5),
#panel.grid.major = element_line(color = "#ebebe5", size = 0.2),
#panel.grid.minor = element_blank(),
#plot.background = element_rect(fill = "#f5f5f2", color = NA),
#panel.background = element_rect(fill = "#f5f5f2", color = NA),
#legend.background = element_rect(fill = "#f5f5f2", color = NA),
panel.border = element_blank())
}
# Min for map scales
prev_min <- function(x) {
min(c(x),na.rm=TRUE)
}
# Max for map scales
prev_max <- function(x) {
max(c(x), na.rm=TRUE)
}
# Theme for bar graphs
my_theme_bar <- function() {
theme_minimal() +
theme(
axis.line = element_blank(),
panel.grid = element_line(color = "white"),
legend.key.size = unit(0.5, "cm"),
legend.text = element_text(size = 8),
legend.title = element_text(size = 10),
plot.title = element_text(size = 14),
panel.border = element_blank())
}
# Percent function
percent <- function(x, format = "f"){
paste0(formatC(x, format = format, digits = 2), "%")
}Necessary inputs are downloaded and/or read.
+ TEC data is downloaded directly from the CMS website.
+ Counties map data from the tigris package.
+ 2019 County population data downloaded from US Census Bureau County Population Totals 2010-2019.
# TEC data
tec_hosp <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/350f34f9ef3d484925d49dfcce7a0f54_1632355520/Timely_and_Effective_Care-Hospital.csv")
tec_state <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/0b2db52e1ce72942f369cc00c00649f8_1632355520/Timely_and_Effective_Care-State.csv")
tec_natl <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/4766dbf7fd2de19618a27f5546954463_1632355520/Timely_and_Effective_Care-National.csv")
# PA counties data
pa_counties<-tigris::counties(state = "42", cb = TRUE) #FIPS 42 is Pennsylvania
# PA counties population data
pa_pop<- readxl::read_xlsx("/Users/moniquearnold/Documents/PSOM/BMIN 503/Final Project/BMIN503_Final_Project/Data/co-est2019-annres-42.xlsx",range="A6:M72",
col_names = c("County","Census","Estimate Base",
"pop_2010",
"pop_2011",
"pop_2012",
"pop_2013",
"pop_2014",
"pop_2015",
"pop_2016",
"pop_2017",
"pop_2018",
"pop_2019"))
# Cleaning data
tec_state$NAME <- state.name[match(tec_state$State,state.abb)]
tec_state$Score<- as.numeric(tec_state$Score)
pa_pop$County<- gsub('^\\.|\\.$', '', pa_pop$County)
pa_pop$NAME<- gsub(" County[,%] Pennsylvania$","", pa_pop$County)First, national data is visualized using us_states dataset provided by the spData package. Note: ‘total_pop_15’ is the numerical vector of total population in 2015. For simplicity only the contiguous United States is considered.
Metrics investigated are:
+ OP_18b: Average (median) time patients spent in the ED
+ OP_18c: Average (median) time patients spent in the ED (Psychiatric/Mental Health Patients)
+ OP_2: Percentage of outpatients with chest pain or possible heart attack who got drugs to break up blood clots within 30 minutes of arrival
+ OP_3b: Percentage of patients who came to the ED with stroke symptoms who received brain scan results within 45 minutes of arrival
+ OP_23: Average (median) number of minutes before outpatients with chest pain or possible heart attack who needed specialized care were transferred to another hospital
# Select variables to investigate
varlist <- c("OP_18b","OP_18c","OP_2","OP_23","OP_3b")
state_vars <- tec_state %>%
dplyr::select(NAME,State,Measure.ID,Score) %>%
filter(Measure.ID %in% varlist) %>%
drop_na() %>%
reshape(idvar = c("State","NAME"), timevar = "Measure.ID", direction = "wide", varying=varlist)
# Merge to usa map data
usa <- us_states
to_map_natl<- merge(usa,state_vars, stringsAsFactors=FALSE)
str(to_map_natl) #check## Classes 'sf' and 'data.frame': 48 obs. of 13 variables:
## $ NAME : chr "Alabama" "Arizona" "Arkansas" "California" ...
## $ GEOID : chr "01" "04" "05" "06" ...
## $ REGION : Factor w/ 4 levels "Norteast","Midwest",..: 3 4 3 4 4 1 3 3 3 4 ...
## $ AREA : Units: [km^2] num 133709 295281 137690 409747 269573 ...
## $ total_pop_10: num 4712651 6246816 2872684 36637290 4887061 ...
## $ total_pop_15: num 4830620 6641928 2958208 38421464 5278906 ...
## $ State : chr "AL" "AZ" "AR" "CA" ...
## $ OP_18b : num 139 174 126 163 140 163 194 152 145 129 ...
## $ OP_18c : num 213 291 201 269 234 278 300 225 289 181 ...
## $ OP_2 : num 51 69 38 48 60 33 NA 54 39 33 ...
## $ OP_23 : num 62 72 73 73 73 75 68 75 70 62 ...
## $ OP_3b : num 74 71 62 78 56 64 NA 56 72 62 ...
## $ geometry :sfc_MULTIPOLYGON of length 48; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:51, 1:2] -88.2 -88.2 -87.4 -86.9 -85.6 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:12] "NAME" "GEOID" "REGION" "AREA" ...
to_map_northeast<- to_map_natl %>% filter(REGION=="Norteast")
to_map_pa <- to_map_natl %>% filter(NAME=="Pennsylvania")
#CANNOT GET THIS TO WORK
# for (i in varlist){
# p <- ggplot() +
# geom_sf(data = to_map_natl, aes(fill = i), lwd=0, colour="white") +
# coord_sf() +
# my_theme() +
# scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
# limit = range(prev_min(to_map_natl[[i]]), prev_max(to_map_natl[[i]]))) +
# labs(title= "Average (median) time patients spent in the ED",
# subtitle= "July - December 2020",
# caption = "Source: Centers for Disease Control") +
# theme(legend.position = "bottom")
# print(p)
# }
# Plot maps
#OP_18b
map_natl_OP_18b<-ggplot() +
geom_sf(data = to_map_natl, aes(fill = OP_18b), lwd=0, colour="white") +
geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
limit = range(prev_min(to_map_natl$OP_18b), prev_max(to_map_natl$OP_18b))) +
labs(title= "Fig. 1: Average (median) time patients spent in the ED",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
geom_text(aes(x = -85, y = 35, hjust="left",
label = "Pennslvania median = 156 mins \nUS national median = 148 mins"), stat = "unique")
# map_natl_OP_18b_NE<-ggplot(to_map_northeast) +
# geom_sf(aes(fill = OP_18b), lwd=0, colour="white") +
# geom_sf_label(aes(label = State)) +
# coord_sf() +
# my_theme() +
# scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
# limit = range(prev_min(to_map_natl$OP_18b), prev_max(to_map_natl$OP_18b))) +
# labs(subtitle= "Northeast") +
# theme(legend.position = "none")
#
# lay <- rbind(c(1,1,1,NA),
# c(1,1,1,2),
# c(1,1,1,2))
# grid.arrange(map_natl_OP_18b, map_natl_OP_18b_NE, layout_matrix = lay)
map_natl_OP_18c<-ggplot() +
geom_sf(data = to_map_natl, aes(fill = OP_18c), lwd=0) +
geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
limit = range(prev_min(to_map_natl$OP_18c), prev_max(to_map_natl$OP_18c))) +
labs(title= "Fig. 2: Average (median) time patients spent in the ED \n(Psychiatric/Mental Health Patients)",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
geom_text(aes(x = -85, y = 35, hjust="left",
label = "Pennslyvania median = 263 mins \nUS national median = 248 mins"), stat = "unique")
map_natl_OP_2<-ggplot() +
geom_sf(data = to_map_natl, aes(fill = OP_2), lwd=0) +
geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, name = "% of patients", option = "magma", direction = -1,
limit = range(prev_min(to_map_natl$OP_2), prev_max(to_map_natl$OP_2))) +
labs(title= "Fig. 3: Percentage of outpatients with chest pain or possible heart \nattack who got drugs to break up blood clots within 30 minutes of arrival",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
geom_text(aes(x = -85, y = 35, hjust="left",
label = "Pennslyvania median = 32% \nUS national median = 52%"), stat = "unique")
map_natl_OP_23<-ggplot() +
geom_sf(data = to_map_natl, aes(fill = OP_23), lwd=0) +
geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, name = "% of patients", option = "cividis", direction = 1,
limit = range(prev_min(to_map_natl$OP_23), prev_max(to_map_natl$OP_23))) +
labs(title= "Fig. 4: Percentage of patients who came to the ED with stroke symptoms \nwho received brain scan results within 45 minutes of arrival",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
geom_text(aes(x = -85, y = 35, hjust="left",
label = "Pennslyvania median = 65% \nUS national median = 72%"), stat = "unique")
map_natl_OP_3b<-ggplot() +
geom_sf(data = to_map_natl, aes(fill = OP_3b), lwd=0) +
geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "magma", direction = -1,
limit = range(prev_min(to_map_natl$OP_3b), prev_max(to_map_natl$OP_3b))) +
labs(title= "Fig. 5: Average (median) number of minutes before outpatients with chest pain \nor possible heart attack who needed specialized care were transferred to \nanother hospital",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
geom_text(aes(x = -85, y = 35, hjust="left",
label = "Pennslyvania median = 76 mins \nUS national median = 61 mins"), stat = "unique")
map_natl_OP_18bmap_natl_OP_18cmap_natl_OP_2map_natl_OP_23map_natl_OP_3bNext, median metrics of Pennsylvania are compared to those of other states in the Northeast.
Note: Since only the final median values and not the underlying data, I cannot perform any statistical comparisons.
# Determine averages
fg_northeast<- as.data.frame(to_map_natl) %>%
filter(REGION=="Norteast") %>%
dplyr::select(NAME,OP_18b,OP_18c,OP_2,OP_23,OP_3b)
# Reshape for barchart
forgraph_northeast <- melt(setDT(fg_northeast), id.vars = c("NAME"), variable.name = "score")
# fg_ne_sum<-summarize_all(fg_northeast,mean,na.rm = TRUE) %>% mutate(NAME="Northeast States")
#
# fg_natl <- tec_natl %>%
# dplyr::select(Measure.ID,Score) %>%
# mutate(NAME="US National") %>%
# filter(Measure.ID %in% varlist) %>%
# reshape(idvar="NAME", timevar = "Measure.ID", direction = "wide", varying=varlist)
#forgraph_all <- rbind(fg_northeast,fg_ne_sum)
# Plot graphs
bar_OP_18b18c <- forgraph_northeast %>%
filter(variable=="OP_18b" | variable=="OP_18c") %>%
ggplot(aes(x=reorder(factor(NAME),value), y=value, fill=variable)) +
geom_bar(stat="identity", colour = "black", position = "stack") +
my_theme_bar() +
theme(axis.title.y=element_blank()) +
geom_text(aes(label=value), vjust=1, color="white", size=3.5) +
scale_y_continuous(name="Median Time (mins)") +
coord_flip() +
labs(title= "Fig. 6: Average (median) time patients spent in the ED",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
scale_fill_brewer(palette = "Set1",
breaks=c("OP_18b", "OP_18c"),
labels=c("All Patients", "Psychiatric/Mental Health Patients")) +
theme(legend.title=element_blank())
bar_OP_2_23 <- forgraph_northeast %>%
filter(variable=="OP_2" | variable=="OP_23") %>%
ggplot(aes(x=reorder(factor(NAME),value), y=value, fill=variable)) +
geom_bar(stat="identity", colour = "black", position = "stack") +
my_theme_bar() +
theme(axis.title.x=element_blank()) +
geom_text(aes(label=value), vjust=1, color="white", size=3.5) +
scale_y_continuous(name="% of patients") +
labs(title="Fig. 7: Percent of ED Patients Presenting with Acute Symptoms who \nReceived Timely Care ",
subtitle= "July - December 2020",
caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
scale_fill_brewer(palette = "Accent",
breaks=c("OP_2", "OP_23"),
labels=c("% pts with chest pain or possible heart attack \nwho got drugs to break up blood clots within 30 minutes of arrival", "% pts with stroke symptoms who received \nbrain scan results within 45 minutes of arrival")) +
theme(legend.title=element_blank(), legend.position="bottom")
bar_OP_18b18cbar_OP_2_23A brief look at the wait times for the hospitals.
# Subset data to metrics of interest and PA
df_hosp <- tec_hosp %>% filter(Measure.ID %in% varlist & State=="PA")
## General hospitals
# Get list of general hospital names
df_hosp_fac <- df_hosp %>%
mutate(addr=paste0(Address,", ",City, ", ",State," ",ZIP.Code)) %>%
distinct(Facility.Name,addr,Measure.ID,Score) %>%
arrange(Facility.Name) %>%
reshape(idvar = c("Facility.Name","addr"), timevar = "Measure.ID", direction = "wide", varying=varlist)
df_hosp_fac$OP_18b <- as.numeric(df_hosp_fac$OP_18b)
df_hosp_fac$OP_18c <- as.numeric(df_hosp_fac$OP_18c)
df_hosp_fac$OP_2 <- as.numeric(df_hosp_fac$OP_2)
df_hosp_fac$OP_23 <- as.numeric(df_hosp_fac$OP_23)
df_hosp_fac$OP_3b <- as.numeric(df_hosp_fac$OP_3b)
df_hosp_fac$addr <- gsub('34TH ST &', '3400', df_hosp_fac$addr)
df_hosp_fac$addr <- gsub('34TH &', '3400', df_hosp_fac$addr)
# Format table
dt_pahosp <- df_hosp_fac %>% dplyr::select(Facility.Name,OP_18b,OP_18c) %>% filter(! is.na(OP_18b) | ! is.na(OP_18c))
dt_pahosp$Facility.Name<-str_to_title(dt_pahosp$Facility.Name)
dt_pahosp$OP_18b_time <- dt_pahosp$OP_18b %>% hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp$OP_18c_time <- dt_pahosp$OP_18c %>% hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp <- dt_pahosp %>% dplyr::rename('Hospital'=Facility.Name,
'Time Spent in ED - All Patients'=OP_18b_time,
'Time Spent in ED - Mental Health Patients'=OP_18c_time)
# Calculate means
dt_pahosp_means <- dt_pahosp %>% dplyr::summarize(mean_wait_all=mean(OP_18b, na.rm = TRUE),
mean_wait_bhv=mean(OP_18c, na.rm = TRUE))
dt_pahosp_means$`All Patients` <- dt_pahosp_means$mean_wait_all %>%
hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp_means$`Mental Health Patients` <- dt_pahosp_means$mean_wait_bhv %>%
hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp_means <- dt_pahosp_means %>% dplyr::select(-mean_wait_all,-mean_wait_bhv) %>% gather()
# Final tables
head(dt_pahosp[order(dt_pahosp$`Time Spent in ED - All Patients`,decreasing=TRUE),],n=10) %>%
dplyr::select(-OP_18b,-OP_18c) %>%
kable(caption="Table 1. Longest Patient Wait Times at PA Hospitals (Top 10)",
align="lcc",
row.names = FALSE) %>%
kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)| Hospital | Time Spent in ED - All Patients | Time Spent in ED - Mental Health Patients |
|---|---|---|
| Jefferson Health- Northeast | 06:26 | NA |
| Milton S Hershey Medical Center | 04:54 | 08:24 |
| Hospital Of Univ Of Pennsylvania | 04:11 | 04:44 |
| Chester County Hospital | 04:05 | 10:08 |
| Penn Presbyterian Medical Center | 03:57 | 04:34 |
| Robert Packer Hospital | 03:44 | 06:07 |
| Upmc Memorial | 03:44 | 04:24 |
| Penn State Health St. Joseph | 03:42 | 04:53 |
| Main Line Hospital Lankenau | 03:38 | NA |
| Einstein Medical Center Montgomery | 03:37 | NA |
tail(dt_pahosp[order(dt_pahosp$`Time Spent in ED - All Patients`,decreasing=TRUE),],n=10) %>%
dplyr::select(-OP_18b,-OP_18c) %>%
kable(caption="Table 2. Shortest Patient Wait Times at PA Hospitals (Bottom 10)",
align="lcc",
row.names = FALSE) %>%
kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)| Hospital | Time Spent in ED - All Patients | Time Spent in ED - Mental Health Patients |
|---|---|---|
| Millcreek Community Hospital | 01:36 | 03:28 |
| Ahn Emerus Westmoreland, Llc | 01:31 | NA |
| Penn Highlands Tyrone | 01:31 | NA |
| Highlands Hospital | 01:30 | 04:05 |
| Upmc Kane | 01:30 | 03:02 |
| Mercy Catholic Medical Center- Mercy Fitzgerald | 01:28 | 04:44 |
| Lecom Health Corry Memorial Hospital | 01:24 | NA |
| Grove City Medical Center | 01:22 | NA |
| Bucktail Medical Center | 01:19 | NA |
| Conemaugh Meyersdale Medical Center | 01:08 | NA |
dt_pahosp_means %>%
kable(caption="Table 3. Mean Patient Wait Times at PA Hospitals ",
align="lr",
row.names = FALSE,
col.names=NULL) %>%
kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)| All Patients | 02:40 |
| Mental Health Patients | 05:03 |
Next, I explore the availability of hospitals across Pennsylvania. I will map hospital densities using the hospital data from TEC.I will map all hospitals, then hospitals across a few key service lines, namely: pediatric hospitals, trauma centers and stroke centers.
Using the package ggmap, I will get latitude and longitude data for each hospital address in the TEC data using the Google API. To do this, I registered for a free Google console account, generated an API key that I registered in R Studio, and enabled the following APIs on the Google Console:
+ Directions API
+ Geocoding API
+ Geolocation API
+ Places API
+ Maps Javascript AVI
+ Maps Embed API
I then converted the data frame of resultant cartesion coordinates to an sf object. I used the the projection WGS84, which is the CRS code #4326, a priori defined in the sf object. I mapped the hospitals onto a map of Pennsylvania using the counties dataset tigris library, overlayed with county population data.
## All hospitals
# getting map coordinates (lat and long) of hospital addresses
for(i in 1:nrow(df_hosp_fac))
{
ifelse(is.na(df_hosp_fac$addr[i]), NA,
hosp_ggmap_pa <- geocode(df_hosp_fac$addr[i],
output = "more",
source = "google",
messaging = FALSE))
df_hosp_fac$Longitude[i] <- as.numeric(hosp_ggmap_pa[1])
df_hosp_fac$Latitude[i] <- as.numeric(hosp_ggmap_pa[2])
}
df_hosp_fac_tib <- as_tibble(df_hosp_fac)
df_hosp_fac_sf <- st_as_sf(df_hosp_fac_tib, coords = c("Longitude", "Latitude"), crs = 4326)
mapview(df_hosp_fac_sf) #Quickly checking addresses geocoded correctly # merge to PA county data to PA population data
pa_tomap<-merge(pa_counties, pa_pop, by="NAME")
class(pa_tomap) #check## [1] "sf" "data.frame"
## Specialty hospitals
varlist_hosp <- c("Childrens","Stroke","Trauma")
for (var in varlist_hosp){
# Initialize table
tablet <- as.character(paste0("list_pa_",var))
# Import
list <- readxl::read_xlsx("/Users/moniquearnold/Documents/PSOM/BMIN 503/Final Project/BMIN503_Final_Project/Data/Specialty Hospitals.xlsx", sheet=var)
# Getting map coordinates (lat and long) of specialty hospital addresses
list2 <- cbind(list,
type_hosp=var,
geocode(list$Hospital,
output = "more",
source = "google",
messaging = FALSE))
list_tib <- as_tibble(list2)
list_sf <- st_as_sf(list_tib, coords = c("lon", "lat"), crs = 4326)
# Assign final table
assign(tablet,list_sf)
}
# checking addresses geocoded correctly
# mapview(list_pa_Childrens)
# mapview(list_pa_Stroke)
# mapview(list_pa_Trauma)
spec_sf <- st_as_sf(rbind.fill(list_pa_Childrens,list_pa_Stroke,list_pa_Trauma)) %>% dplyr::select(-'...3')
class(spec_sf)## [1] "sf" "data.frame"
# Creating annotations for graphs
high_pa_tomap <- pa_tomap %>% filter(pop_2019 > 1000000) #labelling counties with pop > 1 million
for(i in 1:nrow(high_pa_tomap)){
ifelse(is.na(high_pa_tomap$County[i]), NA,
high_pa_tomap2 <- geocode(high_pa_tomap$County[i], output = "latlon", source = "google", messaging = FALSE))
high_pa_tomap$lon[i] <- as.numeric(high_pa_tomap2[1])
high_pa_tomap$lat[i] <- as.numeric(high_pa_tomap2[2])
}
all_hosp_sf <- st_as_sf(rbind.fill(spec_sf,df_hosp_fac_sf) %>%
mutate(type_hosp = ifelse(is.na(type_hosp),"All Hospitals",type_hosp)))
# Map
ggplot() +
geom_sf(data = pa_tomap, aes(fill = pop_2019/100000), color="white") +
geom_sf(data = all_hosp_sf, size = 2, shape = 21, fill = "gold") +
geom_sf_label_repel(data = high_pa_tomap, aes(x=lon,y=lat,label=NAME),
size=3, force=7, nudge_y=-5, nudge_x=1, seed = 10,
alpha=0.8) +
facet_wrap(~type_hosp, shrink=FALSE, dir="h") +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, name = "Population (in 100,000s)", option = "mako", direction = -1,
limit = range(prev_min(pa_tomap$pop_2019/100000),
prev_max(pa_tomap$pop_2019/100000))) +
labs(title= "Fig. 8: Pennsylvania Specialty Service Line Hospitals",
caption = "Source: Centers for Disease Control, US Census Bureau (2019 Census). \nHighlighting counties with population of at least 1 million.") +
theme(legend.justification = c(0, 1),legend.position = "bottom")Next, I will investigate different key demographic factors on average and by county in PA using the tidycensus package and 5 year ACS data ending in 2019. Key factors investigated are:
+ Poverty rate (DP03_0119PE)
+ Median household income (DP03_0062E)
+ Percentage population 25 years and over with high school grad or higher (DP02_0066PE)
+ Unemployment rate (DP03_0009PE)
+ Percentage population without health insurance coverage (DP03_0099PE)
+ Percentage population with public health insurance coverage (DP03_0098PE)
+ Percentage population with private health insurance (DP03_0097PE)
+ Percentage of population White (DP05_0037PE)
+ Percentage of population Asian (DP05_0044PE)
+ Percentage of population Black or African American (DP05_0038PE)
+ Percentage of population Hispanic or Latino (DP05_0078PE)
+ Percentage of population American Indian and Alaska Native (DP05_0039PE)
# Obtaining 2019 acs5 variable list to investigate
dp17 <- load_variables(2019, "acs5/profile", cache = TRUE)
pa_factors <- get_acs(geography = "county",
variables = c(povertyrate = "DP03_0119P",
medincome = "DP03_0062",
edurate = "DP02_0066P",
unemploymentrate = "DP03_0009P",
nohealthins = "DP03_0099P",
pubhealthins = "DP03_0098P",
privhealthins = "DP03_0097P",
racewhite = "DP05_0037P",
raceasian = "DP05_0044P",
raceblack = "DP05_0038P",
racehispanic = "DP05_0078P",
raceamerind = "DP05_0039P"),
state = 42, #PA is 42
year = 2019,
geometry=TRUE)
unique(pa_factors$variable)## [1] "edurate" "unemploymentrate" "medincome" "privhealthins" "pubhealthins" "nohealthins" "povertyrate" "racewhite" "raceblack" "raceamerind" "raceasian" "racehispanic"
tableout <- data.frame(cbind(unique(pa_factors$variable),
c("Percentage population 25 years and over with high school grad or higher",
"Unemployment rate (%)",
"Median household income in millions of USD",
"Percentage of population with private health insurance",
"Percentage of population with public health insurance coverage",
"Percentage of population without health insurance coverage ",
"Poverty Rate (%)",
"Percentage of population White",
"Percentage of population Black or African American",
"Percentage of population American Indian and Alaska Native",
"Percentage of population Asian",
"Percentage of population Hispanic or Latino")))
table_soc <- data.frame(pa_factors) %>%
group_by(variable) %>%
dplyr::select(NAME,variable, estimate) %>%
mutate(estimate = as.numeric(estimate),
variable = as.character(variable)) %>%
dplyr::summarize(mean_estimate=mean(estimate, na.rm = TRUE)) %>%
mutate(mean_estimate = ifelse(variable %in% c("medincome"),
scales::dollar(mean_estimate),
percent(mean_estimate))) %>%
left_join(tableout,by=c("variable"="X1")) %>%
dplyr::rename(`Socioeconomic/Demographic Factor`=X2,
`Mean Value`=mean_estimate)
table_soc %>% dplyr::select(`Socioeconomic/Demographic Factor`,`Mean Value`) %>%
kable(caption="Table 4. Socioeconomic Factors -- Averages across PA Counties",
align="lr",
digits = 3,
col.names=c("Factor","Mean Value")) %>%
kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)| Factor | Mean Value |
|---|---|
| Percentage population 25 years and over with high school grad or higher | 8.90% |
| Median household income in millions of USD | $57,056.09 |
| Percentage of population without health insurance coverage | 5.91% |
| Poverty Rate (%) | 8.31% |
| Percentage of population with private health insurance | 72.23% |
| Percentage of population with public health insurance coverage | 39.20% |
| Percentage of population American Indian and Alaska Native | 0.15% |
| Percentage of population Asian | 1.51% |
| Percentage of population Black or African American | 4.74% |
| Percentage of population Hispanic or Latino | 4.44% |
| Percentage of population White | 90.56% |
| Unemployment rate (%) | 5.04% |
#pack_rows("Education factors", 1,1) %>%
#ORGANIZE BY CATEGORIES: education, economics, health insurance access, race# Mapping variables
counter = 0
for(i in unique(pa_factors$variable)){
# dynamic figure labelling
counter <- counter + 1
figletter <- toupper(letters[counter])
# dynamic titling
tableout2 <- tableout %>% filter(X1==i)
titleout <- as.character(paste0("Fig. 9",figletter,". Pennsylvania ",tableout2$X2," by county"))
# dynamic annotations
anno_soc <- table_soc %>% filter(variable==i)
meanlabel <-
paste0("Calculated Estimated Mean in PA: ",anno_soc$`Mean Value`)
# subset data for variable of interest
spec_table <- pa_factors %>% filter(variable==i)
var_pa_tomap <- spec_table %>%
filter(NAME=="Allegheny County, Pennsylvania" | NAME=="Philadelphia County, Pennsylvania") %>%
mutate(label=paste0(gsub(' County, Pennsylvania','',NAME),": ",estimate)) %>%
st_join(high_pa_tomap)
# final maps
demo_map <- ggplot() +
geom_sf(data = spec_table, aes(fill = estimate), color="white") +
# geom_sf(data = all_hosp_sf, size = 2, shape = 21, fill = "gold") +
geom_sf_label_repel(data = var_pa_tomap, aes(x=lon,y=lat,label=label),
size=3, force=7, nudge_y=-5, nudge_x=1, seed = 10,
fill = alpha(c("white"),0.8)) +
coord_sf() +
my_theme() +
scale_fill_viridis(discrete=FALSE, option = "rocket", direction = -1,
limit = range(prev_min(spec_table$estimate),
prev_max(spec_table$estimate))) +
labs(title = paste0(strwrap(titleout,width=50),collapse="\n"),
caption = "Source: US Census Bureau (2019 Census).",
subtitle = paste0(strwrap(meanlabel,width=50),collapse="\n")) +
theme(legend.justification = c(0, 1),
legend.position = "bottom")
# plot.tag.position = c(),
# plot.tag = element_text(vjust = 1, hjust = 1, size=8))
print(demo_map)
}#paste0(strwrap(titleout,width=50),collapse="\n")Next, I run multivariate analyses to see if there are any relations between the socioeconomic/demographic variables and the TEC data.
#Regressions of wate time vs. the characteristics
# summary(glm((as.numeric(birthwt.clean$low.birth.wt)-1) ~ age, data = birthwt.clean, family = binomial()))As shown in the above maps, patients in Pennsylvania, on average, spend more time waiting in the ED than the national average (Figures 1 and 2). Additionally, patients who go to the ED with signs of a possible heart attack or stroke, on average, were received timely care less often (Figures 3 and 4). Patients with chest pain in PA had to wait longer before transfer to a specialty hospital than the US national average (Figure 5).
Though PA underperforms nationally, the regional graphs shows that PA performs the second best in the Northeast region (behind Vermont) with respect to general ED wait time (Fig. 6), but on par with the region for timely care for acute life-threatening symptoms (chest pain, stroke symptoms, Fig. 7). problem solving.
table 1-3 table 4
As expected, the largest number of hospitals are in the most densely populated counties (Fig. 8). fig 9, figs 10
Limitations: 1. Assumption that only pediatric hospitals see pediatric patients, which is not true. This underestimates access to pediatric care, but still exists as a proxy for access to high quality pediatric care.
I would like to thank Dr. Gary Weissman and Dr. John Holmes for their guidance as described above. I would like to thank my classmates in BMIN 503 for providing inspiration as well as valuable peer-peer
https://www.ahrq.gov/research/findings/nhqrdr/chartbooks/access/elements.html https://books.google.com/books?hl=en&lr=&id=81bSBQAAQBAJ&oi=fnd&pg=PA119&dq=timely+and+effective+care&ots=_mCRAGdZHY&sig=2GyvgJgKH4jEvV7_XkF1H6pNQPE#v=onepage&q=timely%20and%20effective%20care&f=false https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/770345 https://www.cdc.gov/nchs/fastats/emergency-department.htm https://www.kff.org/health-reform/fact-sheet/the-pennsylvania-health-care-landscape/